Matching LSI for Scalable Information Retrieval

Authors

  • Rajagopal Palsonkennedy
  • T. V. Gopal
Abstract

Latent Semantic Indexing (LSI) is one of the popular techniques in the field of information retrieval. Unlike traditional information retrieval techniques, LSI does not rely on simple keyword matching. Instead, it uses statistical and algebraic computations. Using Singular Value Decomposition (SVD), the high-dimensional term-document matrix is converted into a lower-dimensional approximation from which noise can be filtered. LSI can also overcome the synonymy and polysemy problems of traditional techniques by examining how terms are associated with documents. However, LSI suffers from a scalability issue because of the computational complexity of SVD. This study presents MR-LSI, a distributed LSI algorithm that addresses the scalability issue using the Hadoop framework, which is based on the MapReduce distributed computing model. It also addresses the overhead introduced by its clustering step, which uses the k-means algorithm. Evaluations indicate that MR-LSI achieves a notable improvement over the compared scheme when processing large document collections. One significant advantage of Hadoop is that it supports heterogeneous computing environments. In such environments, however, unbalanced load among nodes becomes a prominent issue. Hence, a load-balancing algorithm based on a genetic algorithm is proposed for balancing load in a static environment. The results show that it improves cluster performance to varying degrees.
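The core LSI step described above projects terms and documents into a low-rank space derived from the SVD and then matches them there. It can be illustrated with a small single-machine sketch. This is not the MR-LSI algorithm: it uses no Hadoop, MapReduce, k-means, or genetic load balancing. It also approximates the k leading left singular vectors by power iteration on A Aᵀ instead of calling a full SVD routine. The toy term-document matrix, the function names (`leading_left_singular_vectors`, `lsi_scores`), the fixed random seed, and the choice k = 2 are all illustrative assumptions.

```python
# Illustrative sketch (assumption): single-machine LSI scoring, not the paper's MR-LSI.
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def leading_left_singular_vectors(A, k, iters=500, seed=0):
    """Approximate the k leading left singular vectors of A (terms x documents).

    They are the top eigenvectors of the Gram matrix G = A A^T, found here by
    power iteration with Gram-Schmidt orthogonalization against earlier vectors.
    """
    m, n = len(A), len(A[0])
    G = [[sum(A[i][j] * A[l][j] for j in range(n)) for l in range(m)] for i in range(m)]
    rng = random.Random(seed)  # fixed seed so results are reproducible
    basis = []
    for _ in range(k):
        v = [rng.random() for _ in range(m)]
        for _ in range(iters):
            v = [dot(row, v) for row in G]  # v <- G v
            for u in basis:  # remove directions already found
                c = dot(v, u)
                v = [x - c * y for x, y in zip(v, u)]
            norm = math.sqrt(dot(v, v))
            if norm < 1e-12:
                raise ValueError("k exceeds the numerical rank of A")
            v = [x / norm for x in v]
        basis.append(v)
    return basis


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def lsi_scores(A, query, k):
    """Cosine score of every document against a query in a rank-k LSI space.

    Documents (columns of A) and the query are both projected onto the k
    leading left singular vectors U_k, i.e. U_k^T d and U_k^T q.
    """
    basis = leading_left_singular_vectors(A, k)
    q_hat = [dot(query, u) for u in basis]
    scores = []
    for j in range(len(A[0])):
        doc = [row[j] for row in A]
        d_hat = [dot(doc, u) for u in basis]
        scores.append(cosine(q_hat, d_hat))
    return scores


# Hypothetical toy collection. Rows are terms, columns are documents:
# d0 = {car, engine}, d1 = {automobile, engine}, d2 = {flower, petal}, d3 = {flower}
A = [
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
]
query = [1, 0, 0, 0, 0]  # the single keyword "car"
scores = lsi_scores(A, query, k=2)
print(scores)
```

Document d1 contains "automobile" but not "car", so plain keyword matching gives it a score of 0. In the rank-2 space it scores about as high as d0, because both documents share "engine". This is the synonymy effect that LSI is meant to capture. Projecting with U_kᵀ is equivalent to using the rows of V_k Σ_k as document coordinates, since U_kᵀ A = Σ_k V_kᵀ.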

Similar resources

A resource aware distributed LSI algorithm for scalable information retrieval

Latent Semantic Indexing (LSI) is one of the popular techniques in the information retrieval fields. Different from the traditional information retrieval techniques, LSI is not based on the keyword matching simply. It uses statistics and algebraic computations. Based on Singular Value Decomposition (SVD), the higher dimensional matrix is converted to a lower dimensional approximate matrix, of w...

A MapReduce Based Distributed LSI for Scalable Information Retrieval

Latent Semantic Indexing (LSI) has been widely used in information retrieval due to its efficiency in solving the problems of polysemy and synonymy. However, LSI is notably a computationally intensive process because of the computing complexities of singular value decomposition and filtering operations involved in the process. This paper presents MR-LSI, a MapR...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

PeerVOIRE - Proposal for a Peer-to-Peer Semantic Information Retrieval System

The exponential increase in available data has led to an ever-growing interest in information retrieval techniques. Lexical matching often fails due to synonymy and polysemy. Hence, semantic approaches have been sought, and one of these approaches is Latent Semantic Indexing (LSI). The drawback of LSI is its heavy computational needs. Leaving all this demand on one system is impracti...

Publication date: 2017